Frequency Matrix Approach Demonstrates High Sequence Quality in Avian BARCODEs and Highlights Cryptic Pseudogenes

Authors

  • Mark Y. Stoeckle
  • Kevin C. R. Kerr
Abstract

The accuracy of DNA barcode databases is critical for research and practical applications. Here we apply a frequency matrix to assess sequencing errors in a very large set of avian BARCODEs. Using 11,000 sequences from 2,700 bird species, we show most avian cytochrome c oxidase I (COI) nucleotide and amino acid sequences vary within a narrow range. Except for third codon positions, nearly all (96%) sites were highly conserved or limited to two nucleotides or two amino acids. A large number of positions had very low frequency variants present in single individuals of a species; these were strongly concentrated at the ends of the barcode segment, consistent with sequencing error. In addition, a small fraction (0.1%) of BARCODEs had multiple very low frequency variants shared among individuals of a species; these were found to represent overlooked cryptic pseudogenes lacking stop codons. The calculated upper limit of sequencing error was 8 × 10⁻⁵ errors/nucleotide, which was relatively high for direct Sanger sequencing of amplified DNA, but unlikely to compromise species identification. Our results confirm the high quality of the avian BARCODE database and demonstrate significant quality improvement in avian COI records deposited in GenBank over the past decade. This approach has potential application for genetic database quality control, discovery of cryptic pseudogenes, and studies of low-level genetic variation.
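
The abstract does not spell out how the frequency matrix is built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the sequences are already aligned to equal length, tallies nucleotide frequencies per position, and flags bases that fall below a cutoff. The function names and the `threshold` value of 0.01 are hypothetical choices made for illustration.

```python
from collections import Counter

BASES = "ACGT"

def frequency_matrix(seqs):
    """Return one dict per alignment column mapping each base to its frequency.

    Assumes all sequences are pre-aligned and of equal length (an assumption
    of this sketch). Characters other than A, C, G, T are ignored.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    matrix = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs if s[pos] in BASES)
        total = sum(counts.values())
        matrix.append({b: (counts[b] / total if total else 0.0) for b in BASES})
    return matrix

def low_frequency_variants(seqs, threshold=0.01):
    """List (sequence index, position, base) for bases rarer than `threshold`.

    The threshold is a hypothetical cutoff, not a value taken from the paper.
    """
    matrix = frequency_matrix(seqs)
    flagged = []
    for idx, seq in enumerate(seqs):
        for pos, base in enumerate(seq):
            if base in BASES and matrix[pos][base] < threshold:
                flagged.append((idx, pos, base))
    return flagged
```

In this sketch, a flagged variant seen in only one individual of a species would be a candidate sequencing error, while several flagged variants shared among individuals of a species would be a candidate for the cryptic pseudogene pattern the abstract describes. Grouping the flags by species is left out here.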

Similar resources

Next-generation DNA barcoding: using next-generation sequencing to enhance and accelerate DNA barcode capture from single specimens

DNA barcoding is an efficient method to identify specimens and to detect undescribed/cryptic species. Sanger sequencing of individual specimens is the standard approach in generating large-scale DNA barcode libraries and identifying unknowns. However, the Sanger sequencing technology is, in some respects, inferior to next-generation sequencers, which are capable of producing millions of sequenc...

Identifying the Main Mosquito Species in China Based on DNA Barcoding

Mosquitoes are insects of the Diptera, Nematocera, and Culicidae families, some species of which are important disease vectors. Identifying mosquito species based on morphological characteristics is difficult, particularly the identification of specimens collected in the field as part of disease surveillance programs. Because of this difficulty, we constructed DNA barcodes of the cytochrome c o...

Molecular phylogeny of some avian species using Cytochrome b gene sequence analysis

Veritable identification and differentiation of avian species is a vital step in conservative, taxonomic, forensic, legal and other ornithological interventions. Therefore, this study involved the application of molecular approach to identify some avian species i.e. Chicken (Gallus gallus), Muscovy duck (Cairina moschata), Japanese quail (Coturnix japonica), Laughing dove (Streptopelia senegale...

Sequence Analysis and Phylogenetic Study of Hemagglutinin Gene of H9N2 Subtype of Avian Influenza Virus Isolated during 1998-2002 in Iran

Sequence analysis and phylogenetic study of hemagglutinin (HA) gene of H9N2 subtype of avian influenza virus isolates (outbreaks of 1998-2002) in Tehran province (Iran) were studied. Two sets of forward and reverse primers in highly conserved regions, based on sequences of HA gene in GenBank, were designed. PCR products of a 430-bp fragment of 16 isolates were sequenced and then were aligned wi...

Is esterase-P encoded by a cryptic pseudogene in Drosophila melanogaster?

We have amplified and sequenced the gene encoding Esterase-P (Est-P) in 10 strains of Drosophila melanogaster. Three premature termination codons occur in the coding region of the gene in two strains. This observation, together with other indirect evidence, leads us to propose that Est-P may be a pseudogene in D. melanogaster. Est-P would be a "cryptic" pseudogene, in the sense that it retains ...

Journal title:

Volume 7, issue

Pages -

Publication date: 2012